perm filename EVALU[DIS,DBL]3 blob sn#210301 filedate 1976-04-14 generic text, type C, neo UTF8
.NSECP(Evaluating AM)

.TURN ON "{}"

This chapter contains discussions "meta" to AM itself.

First comes an essay about judging the performance of a system like AM.
This is a very hard task, since AM has no "goal". Even using current mathematical
standards, should AM be judged on what it produced, on the quality of the
path which led to those results, or on the difference between what it started with
and what it finally derived?

Section {SECNUM}.2 deals with the capabilities and limitations of AM.

.B48

   What are some notable omissions in AM's behavior? Can the user elicit these?

   What concepts can be elicited from AM now? With a little tuning/tiny additions?

   What could probably be done within a couple months of modifications?

   Aside from a total change of domain, what kinds of activities does AM lack
   (e.g., proof capabilities)? What concepts and discoveries are beyond its design
   limitations?

.E


Next is an evaluation of the human engineering features (and humans' reactions).
What is the role of the user, both in actuality and ultimately?

Finally, all the conclusions will be gathered together. The next chapter will
try to generalize those conclusions, to say something about the
general problem of emulating empirical research.

.TURN OFF "{}"

.SSEC(Judging Performance)


One may view AM's activity  as a progression from an initial core of knowledge
to a more sophisticated "final"$$ As has been stressed over and over, AM has no
fixed goal, no "final" state. For practical purposes, however, the totality of
explorations by AM is about the same as the "best run so far"; either of these can be
thought of as defining what is meant by the "final" state of knowledge. $
body of concepts and their facets.
Then each of the following is a reasonable way to measure success, to "judge" AM:


.BN

λλ By AM's ultimate achievements. Examine the list of 
concepts and methods AM developed.
Did AM ever discover anything interesting yet unknown to the user?$$ 
The "user" is a human who works with AM interactively, giving it hints, commands,
questions, etc.
Notice that by "new" we mean new to the user, not new to Mankind. 
This might occur if the user were a child, and AM discovered
some elementary facts of arithmetic.
This is not really
so provincial:  mathematicians take "new" to mean new to Mankind, not
new in the Universe.  I feel philosophy slipping in, so this footnote is
terminated. $ Anything new to Mankind?

λλ By the character of the difference between the initial and final states.
Progressing from set theory to number theory is much more impressive than progressing
from two-dimensional geometry to three-dimensional geometry.

λλ By the quality of the route AM took to accomplish these advances:  
How clever, how circuitous,
how many of the detours were quickly identified as such and abandoned?
 
λλ By the character of the User--System interactions: How important is the user's
guidance? How closely must he guide AM? What happens if he doesn't say anything ever?
When he does want to say something, is there an easy way to express that to AM,
and does AM respond well to it?
Given a reasonable kick in the right direction, can AM develop the mini-theories
which the user intended, or at least something equally interesting?

λλ By its intuitive heuristic powers: Does AM believe in "reasonable" conjectures?
How accurately does AM estimate the difficulty of tasks it
is considering?  
Does AM tie together (e.g., as analogous) concepts which are formally unrelated
yet which benefit from such a tie?

λλ By the results of the experiments described in
Section {[2] EXAM2}.{[2] EXPTSSEC}, page {[3] EXPTPAGE}.
How fragile is the worth numbering scheme? The priority of tasks scheme?
How domain-specific are those heuristics really? The set of facets?

λλ By the fact that the kinds of experiments outlined in the next section can
easily be "set up" and performed on AM.
Regardless of the experiments' outcomes, 
the features of AM which allow them to be carried
out at all are worthy of note.

λλ By the implications of this project. What can AM suggest about educating
young mathematicians (and scientists in general)?
What can AM say about doing math (about empirical research in general)?

.E

For each of these measuring criteria, 
a subsection will now be provided, to (i) illustrate a
stunning achievement of AM along that dimension, (ii) illustrate a stunning failure, and
(iii) try to characterize objectively AM's performance according to that measure.

. SSSEC(AM's Ultimate Discoveries)

λλ By AM's ultimate achievements. Examine the list of 
concepts and methods AM developed.
Did AM ever discover anything interesting yet unknown to the user?$$ 
The "user" is a human who works with AM interactively, giving it hints, commands,
questions, etc.
Notice that by "new" we mean new to the user, not new to Mankind. 
This might occur if the user were a child, and AM discovered
some elementary facts of arithmetic.
This is not really
so provincial:  mathematicians take "new" to mean new to Mankind, not
new in the Universe.  I feel philosophy slipping in, so this footnote is
terminated. $ Anything new to Mankind?


. SSSEC(The Magnitude of AM's Progress)

λλ By the character of the difference between the initial and final states.
Progressing from set theory to number theory is much more impressive than progressing
from two-dimensional geometry to three-dimensional geometry.

. SSSEC(The Quality of AM's Route)

λλ By the quality of the route AM took to accomplish these advances:  
How clever, how circuitous,
how many of the detours were quickly identified as such and abandoned?
 

. SSSEC(The Character of the User-System Interactions)

λλ By the character of the User--System interactions: How important is the user's
guidance? How closely must he guide AM? What happens if he doesn't say anything ever?
When he does want to say something, is there an easy way to express that to AM,
and does AM respond well to it?
Given a reasonable kick in the right direction, can AM develop the mini-theories
which the user intended, or at least something equally interesting?

. SSSEC(AM's Intuitive Powers)

λλ By its intuitive heuristic powers: Does AM believe in "reasonable" conjectures?
How accurately does AM estimate the difficulty of tasks it
is considering?  
Does AM tie together (e.g., as analogous) concepts which are formally unrelated
yet which benefit from such a tie?

. SSSEC(Experiments on AM)

λλ By the results of the experiments described in
Section {[2] EXAM2}.{[2] EXPTSSEC}, page {[3] EXPTPAGE}.
How fragile is the worth numbering scheme? The priority of tasks scheme?
How domain-specific are those heuristics really? The set of facets?
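
One way to make the fragility question concrete is to perturb the worth values
and see how much the resulting ordering of concepts changes. The following is a
minimal Python sketch of such a perturbation test, not AM's actual code: the
concept names, the worth values, the uniform-noise model, and the function names
(`rank`, `pairwise_agreement`, `perturb`, `fragility_trial`) are all assumptions
introduced purely for illustration.

```python
import random
from itertools import combinations


def rank(worths):
    """Return concept names ordered from highest to lowest worth."""
    return sorted(worths, key=lambda name: (-worths[name], name))


def pairwise_agreement(order_a, order_b):
    """Fraction of concept pairs that appear in the same relative order."""
    pos_a = {name: i for i, name in enumerate(order_a)}
    pos_b = {name: i for i, name in enumerate(order_b)}
    pairs = list(combinations(order_a, 2))
    same = sum(
        (pos_a[x] < pos_a[y]) == (pos_b[x] < pos_b[y]) for x, y in pairs
    )
    return same / len(pairs)


def perturb(worths, noise, rng):
    """Add uniform noise in [-noise, +noise] to every worth value."""
    return {name: w + rng.uniform(-noise, noise) for name, w in worths.items()}


def fragility_trial(worths, noise, trials=200, seed=0):
    """Average ordering agreement between original and perturbed worths."""
    rng = random.Random(seed)
    baseline = rank(worths)
    scores = [
        pairwise_agreement(baseline, rank(perturb(worths, noise, rng)))
        for _ in range(trials)
    ]
    return sum(scores) / len(scores)


# Hypothetical worth values, invented for illustration only.
WORTHS = {"Sets": 300, "Equality": 500, "Numbers": 700, "Primes": 800, "Bags": 250}

if __name__ == "__main__":
    for noise in (0, 20, 200):
        print(noise, round(fragility_trial(WORTHS, noise), 3))
```

An agreement of 1.0 means the noise never reordered any pair of concepts. With
these invented values, noise of up to 20 cannot change the ordering, since the
two closest worths differ by 50; larger noise levels show how quickly the
ordering degrades under this (assumed) noise model.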

. SSSEC(How to Perform Experiments on AM)

λλ By the fact that the kinds of experiments outlined in the next section can
easily be "set up" and performed on AM.
Regardless of the experiments' outcomes, 
the features of AM which allow them to be carried
out at all are worthy of note.

. SSSEC(Future Implications of this Project)

λλ By the implications of this project. What can AM suggest about educating
young mathematicians (and scientists in general)?
What can AM say about doing math (about empirical research in general)?

.SSEC(Capabilities and Limitations of AM)

.SSEC(The Role of the Human)

.SSEC(Summary of Conclusions)